---
title: "R_code_final"
author: "Maximilian Zeyda"
date: "4/26/2020"
output: html_document
---

## Load libraries

```{r}
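# Install only the packages that are missing, so knitting does not reinstall
# everything on each run. Note: the CRAN package is "lmtest" (not "lmTest").
# car and performance are included because they are called via `::` further down.
pkgs <- c("rbin", "devtools", "farver", "binr", "psych", "lmerTest",
          "sjstats", "ggpubr", "lmtest", "psycho", "car", "performance")
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)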
library(lme4)
library(tidyverse)
library(rbin)
library(devtools)
library(ggplot2)
library(farver)
library(binr)
library(MASS)
library(psych)
library(Hmisc)
library(lmerTest)
library(irr)
library(tidyr)
library(sjstats)
library(dplyr)
library(ggpubr)
library(lmtest)
library(psycho)
```

## Load data

The dataset, which was prepared with Python, is imported into RStudio. The suffix `_notscaled` in the file name indicates that no z-transformation (standardization) has been applied yet.

`na.omit()` removes every observation (row) that contains at least one missing value.

```{r}
data <- read.csv("2022_04_06_notscaled.csv")
data <- na.omit(data)
summary(data)
```
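Because `na.omit()` drops rows silently, it can help to record how many observations are lost. The following is a minimal sketch on a made-up toy data frame (the values and the names `toy`, `toy_complete`, and `n_dropped` are illustrative assumptions, not part of the real dataset). It is shown for illustration only and is not evaluated when knitting.

```r
# Toy data frame standing in for the real dataset (values are made up).
toy <- data.frame(
  prod_scale = c(50, 75, NA, 100),
  bpm_avg    = c(72.1, NA, 80.3, 65.0)
)

# Count rows before and after listwise deletion.
n_before     <- nrow(toy)
toy_complete <- na.omit(toy)
n_after      <- nrow(toy_complete)
n_dropped    <- n_before - n_after

# Missing values per column, to see which variables cause the loss.
na_per_column <- colSums(is.na(toy))

n_dropped
na_per_column
```

Applied to the real data, the same two counts would show whether the listwise deletion removes a substantial share of the sample.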

## Factor control variables

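Convert the numeric 0/1 control variables (gender, senior, manager) into categorical binary variables (factors). In the same step, `meeting_no` is converted to a factor because it is the grouping variable of the mixed models, `prod_cat` becomes an ordered factor, and the continuous predictors are z-standardized with `scale()`.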

```{r}
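# Binary control variables (coded 0/1) become factors.
data$gender  <- factor(data$gender)
data$senior  <- factor(data$senior)
data$manager <- factor(data$manager)

# Grouping variable for the random intercepts.
data$meeting_no <- factor(data$meeting_no)

# Ordered factor for the productivity category.
data$prod_cat <- factor(data$prod_cat, ordered = TRUE)

# z-standardize the continuous predictors. scale() returns a one-column
# matrix, so as.numeric() is used to keep plain numeric columns.
data$participants <- as.numeric(scale(data$participants))
data$bpm_avg <- as.numeric(scale(data$bpm_avg))
data$bpm_var <- as.numeric(scale(data$bpm_var))
data$acc_avg <- as.numeric(scale(data$acc_avg))
data$acc_var <- as.numeric(scale(data$acc_var))
data$mic_avg <- as.numeric(scale(data$mic_avg))
data$mic_var <- as.numeric(scale(data$mic_var))
summary(data)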
```
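The `scale()` calls in the chunk above perform the z-transformation: each value has the column mean subtracted and is divided by the column standard deviation. A small sketch with made-up numbers (the vector `x` is an assumption for illustration) shows the equivalence and the fact that `scale()` returns a one-column matrix, not a plain vector. It is shown for illustration only and is not evaluated when knitting.

```r
# Made-up values standing in for one continuous predictor.
x <- c(60, 70, 80, 90)

# Manual z-transformation: subtract the mean, divide by the standard deviation.
z_manual <- (x - mean(x)) / sd(x)

# The same transformation via scale(), converted back to a plain vector.
z_scale <- as.numeric(scale(x))

# Both versions agree up to floating-point tolerance.
same_result <- isTRUE(all.equal(z_manual, z_scale))

# scale() itself returns a matrix with centering and scaling attributes.
returns_matrix <- is.matrix(scale(x))

same_result
returns_matrix
```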

## Creating the correlation table of the predictor variables

Create a subset of the data set that includes only the predictor variables, then compute the correlation table.

```{r}
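# Columns 2 to 11 hold the predictor variables.
cor_data <- data[, c(2:11)]
cor_data

# cor() needs numeric input, so the factors are converted to their
# integer codes (levels 0/1 become 1/2; correlations are unaffected).
cor_data$gender <- as.numeric(cor_data$gender)
cor_data$senior <- as.numeric(cor_data$senior)
cor_data$manager <- as.numeric(cor_data$manager)
cor_data$participants <- as.numeric(cor_data$participants)

# Pearson correlation matrix, rounded to two decimals.
res <- cor(cor_data, method = "pearson")
round(res, 2)

# Hmisc::rcorr() additionally reports the p-values of the correlations.
res2 <- rcorr(as.matrix(cor_data))
res2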
```

The correlation between `mic_avg` and `mic_var` is 0.77, so `mic_var` is removed from the model.
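Highly correlated predictor pairs can also be pulled out of a correlation matrix programmatically. The sketch below uses simulated data; the data frame `toy`, its columns, and the cutoff of 0.7 are assumptions chosen for illustration, not values taken from the analysis. It is shown for illustration only and is not evaluated when knitting.

```r
# Simulated predictors: b is built to correlate strongly with a.
set.seed(1)
a <- rnorm(100)
toy <- data.frame(
  a = a,
  b = a + rnorm(100, sd = 0.3),
  c = rnorm(100)
)

r <- cor(toy)

# Blank out the diagonal and upper triangle so each pair appears once.
r[upper.tri(r, diag = TRUE)] <- NA

# Hypothetical cutoff; which() treats the NA entries as FALSE.
cutoff <- 0.7
high <- which(abs(r) > cutoff, arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(r)[high[, "row"]],
  var2 = colnames(r)[high[, "col"]],
  r    = r[high]
)
high_pairs
```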

## Plotting the outcome variable

```{r}
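# Histogram on the density scale with an overlaid kernel density estimate.
# after_stat(density) replaces the deprecated ..density.. notation.
p <- ggplot(data, aes(x = prod_scale)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white") +
  geom_density(alpha = .2, fill = "#FF6666")
p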
```

The outcome variable has a skewed distribution: most ratings have a value of 50 (semi-productive), and almost no meetings were rated as unproductive (0).

## Calculate VIFs for a simple linear model with all predictor variables

```{r}
linear <- lm(prod_scale ~ 1 + bpm_avg + bpm_var + acc_avg + acc_var + mic_avg + gender + senior + manager + participants, data = data)
summary(linear)
car::vif(linear)
```
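For a predictor with a single coefficient, the variance inflation factor is 1 / (1 - R²), where R² comes from regressing that predictor on all other predictors. The sketch below computes this by hand on simulated data (the variables `x1`, `x2`, `x3`, and `y` are made up for illustration). It is meant to show the definition, not to replace `car::vif()`, and is not evaluated when knitting.

```r
# Simulated data: x2 is deliberately correlated with x1, x3 is independent.
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)
x3 <- rnorm(n)
y  <- 1 + x1 + x2 + x3 + rnorm(n)
toy <- data.frame(y, x1, x2, x3)

predictors <- c("x1", "x2", "x3")

# For each predictor: regress it on the remaining predictors and
# convert the resulting R-squared into a VIF.
vif_manual <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  r2 <- summary(lm(reformulate(others, response = v), data = toy))$r.squared
  1 / (1 - r2)
})
vif_manual
```

In this toy setup the correlated pair `x1` and `x2` gets clearly larger values than the independent `x3`, whose VIF stays close to 1.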

## Check for heteroscedasticity using the Breusch-Pagan test

```{r}
# The fitted model already carries its data, so no data argument is needed.
bptest(linear)
```
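By default `bptest()` reports the studentized (Koenker) version of the Breusch-Pagan test. Its statistic is n times the R² of an auxiliary regression of the squared residuals on the predictors, compared against a chi-squared distribution. The following base-R sketch reproduces that logic on simulated heteroscedastic data (all variable names and values are assumptions for illustration) and is not evaluated when knitting.

```r
# Simulated data whose error spread grows with x (heteroscedastic by design).
set.seed(7)
n <- 300
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 0.3 * x)

fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the same predictor.
aux <- lm(resid(fit)^2 ~ x)

# Studentized Breusch-Pagan statistic and p-value
# (degrees of freedom = number of predictors in the auxiliary model).
bp_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

c(statistic = bp_stat, p_value = bp_p)
```

A small p-value indicates that the residual variance depends on the predictors, which is the pattern built into this toy example.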

## Calculate the ICC(1) for the outcome variable with the null model

1) The baseline (null) model with random intercepts per meeting is fitted and stored as `null_lmer`.
2) ICC(1) is calculated from this model.

```{r}
null_lmer <- lmer(prod_scale ~ 1 + (1 | meeting_no), data = data)
performance::icc(null_lmer)
```
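ICC(1) is the share of the total variance that lies between groups: the random-intercept variance divided by the sum of the random-intercept and residual variances. The sketch below computes it directly from the variance components, using the `sleepstudy` data shipped with lme4 as a stand-in for the meeting data (the model `m0` and its variables are illustrative assumptions). It is not evaluated when knitting.

```r
library(lme4)

# Null model with random intercepts per subject, on lme4's example data.
m0 <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy)

# Extract the variance components as a data frame.
vc <- as.data.frame(VarCorr(m0))
var_between <- vc$vcov[vc$grp == "Subject"]   # random-intercept variance
var_within  <- vc$vcov[vc$grp == "Residual"]  # residual variance

# ICC(1): between-group variance as a share of the total variance.
icc1 <- var_between / (var_between + var_within)
icc1
```

For the null model in this document, the grouping name would be `meeting_no` instead of `Subject`.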


## Get regression results and BIC of Model 0 (Null Model)

```{r}
summary(null_lmer)
BIC(null_lmer)
```

## Get regression results and BIC of Model 1 (Fixed Control Predictors and Randomly Varying Intercepts)

```{r}
lm_control <- lmer(prod_scale ~ 1 + gender + senior + manager + participants + (1 | meeting_no), data = data)
summary(lm_control)
BIC(lm_control)
```

## Get regression results and BIC of Model 2 (Fixed Control and Body Signals Predictors with Randomly Varying Intercepts)

```{r}
lm_control_bod <- lmer(prod_scale ~ 1 + bpm_avg + bpm_var + acc_avg + acc_var + mic_avg + gender + senior + manager + participants + (1 | meeting_no), data = data)
summary(lm_control_bod)
BIC(lm_control_bod)
```
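One point to keep in mind when comparing the BIC values of Models 0 to 2: `lmer()` fits with REML by default, and REML-based information criteria are generally not comparable across models that differ in their fixed effects. A common approach is to refit the models with maximum likelihood before comparing them. The sketch below illustrates this on lme4's `sleepstudy` data; the models `m0` and `m1` are stand-ins and not the models of this analysis. It is not evaluated when knitting.

```r
library(lme4)

# Two nested models that differ only in their fixed effects.
m0 <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy)
m1 <- lmer(Reaction ~ 1 + Days + (1 | Subject), data = sleepstudy)

# Refit with maximum likelihood so the criteria are comparable.
m0_ml <- update(m0, REML = FALSE)
m1_ml <- update(m1, REML = FALSE)

bic_ml <- c(m0 = BIC(m0_ml), m1 = BIC(m1_ml))
bic_ml

# anova() refits REML models with ML automatically and adds a
# likelihood-ratio test next to AIC and BIC.
anova(m0, m1)
```

The same pattern could be applied to `null_lmer`, `lm_control`, and `lm_control_bod` if an ML-based comparison is wanted.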

For the random forest classification results, please refer to the Python code.